Transferring data in a digital data processing system.



A system and method for transferring data from a first storage medium to a second storage medium, each of the storage media being divided into corresponding data blocks, the method comprising steps of: (a) reading data stored in a first data block in the first storage medium, the first data block initially constituting a current data block; (b) comparing data read in the current data block to data stored in a corresponding data block in the second storage medium; (c) if the data compared in step b are identical, reading data stored in a different data block in the first storage medium, the different data block becoming the current data block, and returning to step b; (d) if the data compared in step b are not identical, modifying the data stored in one of the storage media such that the data in the current data block is identical to the corresponding data in the second storage medium; and (e) rereading the data in the current data block and returning to step b.

TRANSFERRING DATA IN A DIGITAL DATA PROCESSING SYSTEM



BACKGROUND OF THE INVENTION 

This invention relates to a device for transferring digital data between two storage devices in a digital data processing system. The preferred embodiment is described in connection with a system for establishing and maintaining one or more duplicate or "shadow" copies of stored data to thereby improve the availability of the stored data.

A typical digital computer system includes one or more mass storage subsystems for storing data (which may include program instructions) to be processed. In typical mass storage subsystems, the data is actually stored on disks. Disks are divided into a plurality of tracks, at selected radial distances from the center, and sectors defining particular angular regions across each track, with each track and set of one or more sectors comprising a block, in which data is stored.

Since stored data may be unintentionally corrupted or destroyed, systems have been developed that create multiple copies of stored data, usually on separate storage devices, so that if the data on one of the copies is damaged, it can be recovered from one or more of the remaining copies. Such multiple copies are known as a "shadow set." In a shadow set, data that is stored in particular blocks on one member of the shadow set is typically the same as data stored in corresponding blocks on the other members of the shadow set. It is usually desirable to permit multiple host processors to simultaneously access (i.e., in parallel) the shadow set for read and write type requests ("I/O requests").

A new storage device or "new member" is occasionally added to the shadow set. For example, it may be desirable to increase the number of shadow set members to improve the data availability, or it may be necessary to replace a shadow set member that was damaged. Because all shadow set members contain the same data, when adding a new member, all of the data stored on the active members is copied to the new member.



SUMMARY OF THE INVENTION

The invention generally features a system and method for transferring data from a first storage medium to a second storage medium and is used in the preferred embodiment to copy data from an active member of a shadow set to a new shadow set member. In the preferred embodiment, the first storage medium is available to one or more hosts for I/O operations, and each of the storage media is divided into corresponding data blocks, the method generally including the steps of: (a) reading data stored in a first data block in the first storage medium, the first data block initially constituting a current data block; (b) comparing data read in the current data block to data stored in a corresponding data block in the second storage medium; (c) if the data compared in step b are identical, reading data stored in a different data block in the first storage medium, the different data block becoming the current data block, and returning to step b; (d) if the data compared in step b are not identical, transferring the data stored in the current data block to a corresponding data block in the second storage medium; and (e) rereading the data in the current data block and returning to step b.
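
Steps (a) through (e) are shown below as a minimal sketch. It models each storage medium as an in-memory Python list of `bytes` blocks. It also takes the "different data block" of step (c) to be the adjacent block, as in the preferred embodiment. The function name `copy_until_identical` is hypothetical. A real host would issue disk commands rather than index lists.

```python
from typing import List


def copy_until_identical(first: List[bytes], second: List[bytes]) -> int:
    """Make `second` match `first` using steps (a)-(e); return the number of writes.

    Hypothetical helper: each medium is an in-memory list of data blocks, and the
    "different data block" of step (c) is the adjacent (next) block.
    """
    if len(first) != len(second):
        raise ValueError("media must be divided into corresponding data blocks")
    writes = 0
    current = 0
    if not first:
        return writes
    data = first[current]                  # (a) read the first data block
    while True:
        if data == second[current]:        # (b) compare with the corresponding block
            current += 1                   # (c) identical: move to the adjacent block
            if current == len(first):
                return writes              # every block has been compared
            data = first[current]
            continue
        second[current] = data             # (d) not identical: update the second medium
        writes += 1
        data = first[current]              # (e) reread the current block, back to (b)
```

For example, with `src = [b"aa", b"bb", b"cc"]` and `dst = [b"aa", b"xx", b"yy"]`, `copy_until_identical(src, dst)` performs two writes and leaves `dst` equal to `src`. In this single-threaded model the reread in step (e) always matches. Step (e) matters only when another host can change the first medium between steps (d) and (e).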

In the preferred embodiment, the different data block is a data block adjacent to the current data block. Each data block in the first storage medium is compared to a corresponding data block in the second storage medium. Each of the storage media may be directly accessed by one or more host processors. The storage media may be disk storage devices.

The invention allows data to be copied from one storage medium to a second storage medium without interrupting I/O operations from one or more hosts to the shadow set. Therefore, a shadowing system can be maintained that provides maximum availability to data with no interruption to routine I/O operations while providing consistent and correct results.

Other advantages and features of the invention will be apparent from the following detailed description of the invention and the appended claims.

DESCRIPTION OF THE PREFERRED EMBODIMENTS



Drawings 

We first briefly describe the drawings. 
Fig. 1 is a shadow set storage system according to the present invention.

Figs. 2-4 are data structures used with the invention.

Fig. 5 is a flow chart illustrating the method employed by the invention.



Structure and Operation 

Referring to Fig. 1, a system employing the invention includes a plurality of hosts 9, each of which includes a processor 10, memory 12 (including buffer storage) and a communications interface 14. The hosts 9 are each directly connected through a communications medium (e.g., by a virtual circuit) to two or more storage subsystems, illustrated generally at 17 (two are shown).

Each storage subsystem includes a disk controller 18 that controls I/O requests to one or more disks 20, which form the members of the shadow set. Disk controller 18 includes a buffer 22, a processor 24 and memory 26 (e.g., volatile memory). Processor 24 receives I/O requests from hosts 9 and controls reads from and writes to disk 20. Buffer 22 temporarily stores data received in connection with a write command before the data is written to a disk 20. Buffer 22 also stores data read from a disk 20 before the data is transmitted to the host in response to a read command. Processor 24 stores various types of information in memory 26.

Each host 9 will store, in its memory 12, a table that includes information about the system that the hosts 9 need to perform many operations. For example, hosts 9 will perform read and write operations to storage subsystems 17 and must know which storage subsystems are available for use, what disks are stored in the subsystems, etc. As will be described in greater detail below, the hosts 9 will slightly alter the procedure for read and write operations if data is being transferred from one shadow set member to another. Therefore, the table will store status information regarding any ongoing operations. The table also contains other standard information.

While each storage subsystem may include multiple disks 20, the shadow set members are chosen to be disks in different storage subsystems. Therefore, hosts do not access two shadow set members through the same disk controller. This avoids a "central point of failure." In other words, if the shadow set members have a common or central controller, and that controller malfunctions, the hosts will not be able to successfully perform any I/O operations. In the preferred system, however, the shadow set members are "distributed," and the failure of one device (e.g., one disk controller 18) will not inhibit I/O operations because they can be performed using another shadow set member accessed through another disk controller.
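
The placement rule can be checked mechanically, as in the sketch below. It assumes that the system knows which disk controller serves each disk. In the sketch that knowledge is a plain dictionary, and the disk and controller names are hypothetical. `is_distributed` tests the "no shared controller" rule. `usable_members` shows which members can still service I/O after controllers fail.

```python
from typing import Dict, Iterable, List


def is_distributed(members: Iterable[str], controller_of: Dict[str, str]) -> bool:
    """True if no two shadow set members are served by the same disk controller."""
    controllers = [controller_of[m] for m in members]
    return len(controllers) == len(set(controllers))


def usable_members(members: Iterable[str], controller_of: Dict[str, str],
                   failed_controllers: Iterable[str]) -> List[str]:
    """Members that can still service I/O after the given controllers fail."""
    failed = set(failed_controllers)
    return [m for m in members if controller_of[m] not in failed]
```

The example uses `controller_of = {"disk_a": "ctrl_1", "disk_b": "ctrl_2", "disk_c": "ctrl_1"}`. With that mapping, the shadow set `["disk_a", "disk_b"]` is distributed and keeps `disk_b` usable if `ctrl_1` fails. The set `["disk_a", "disk_c"]` shares a central point of failure.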

When a host wishes to write data to the shadow set, the host issues a command whose format is illustrated in Fig. 2A. The command includes a "command reference number" field that uniquely identifies the command, and a "unit number" field that identifies the disk to which the command is directed. The host issues a separate write command for each disk that makes up the shadow set, using the proper unit number. The "opcode" field identifies that the operation is a write. The "byte count" field gives the total number of bytes contained in the data to be written, and the "logical block number" identifies the starting location on the disk. The "buffer descriptor" identifies the location in host memory 12 that contains the data to be written.
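
The write command of Fig. 2A can be modeled as a small record, shown below. The field names follow the text. Their Python types, the string opcode, and the sequential reference numbering are assumptions of this sketch, not the on-the-wire format. `write_commands` builds the one-command-per-member fan-out described above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Command:
    """Host command with the fields named for Fig. 2A (types are assumptions)."""
    command_reference_number: int  # uniquely identifies the command
    unit_number: int               # identifies the disk the command is directed to
    opcode: str                    # e.g. "WRITE"
    byte_count: int                # total number of bytes to be written
    logical_block_number: int      # starting location on the disk
    buffer_descriptor: int         # location of the data in host memory 12


def write_commands(units: List[int], byte_count: int, logical_block_number: int,
                   buffer_descriptor: int, first_reference: int = 1) -> List[Command]:
    """Build one write command per shadow set member, each with its own unit number."""
    return [Command(first_reference + i, unit, "WRITE", byte_count,
                    logical_block_number, buffer_descriptor)
            for i, unit in enumerate(units)]
```

All commands share the data location and disk address. Only the unit number and the command reference number differ between members.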

The format of a read command is illustrated in Fig. 2B and includes fields that are similar to the write command fields. For a read command, the buffer descriptor contains the location in host memory 12 to which the data read from the disk is to be stored.

Once a host transmits a read or write command, it is received by the disk controller 18 that serves the disk identified in the "unit number" field. For a write command, the disk controller will implement the write to its disk 20 and return an "end message" to the originating host. The format of the write command end message is illustrated in Fig. 3A. The end message includes a status field that informs the host whether or not the command was completed successfully. If the write failed, the status field can include error information, depending on the nature of the failure. The "first bad block" field indicates the address of a first block on the disk that is damaged (if any).

For a read command, the disk controller will read the requested data from its disk and transmit the data to memory 12 of the originating host. An end message is also generated by the disk controller after a read command and sent to the originating host. The format of the read command end message is illustrated in Fig. 3B. The read command end message is similar to the end message for the write command.

As will be explained below, the system utilizes a "Compare Host" operation when transferring data between two shadow set members. The command message format for the Compare Host operation is shown in Fig. 4A. The Compare Host operation instructs the disk controller supporting the disk identified in the "unit number" field to compare two sets of data. The first is the data stored in a section of host memory identified in the "buffer descriptor" field. The second is the data stored on the disk in the location identified by the "logical block number" and "byte count" fields.

The disk controller receiving the Compare Host command will execute the requested operation in three parts. It reads the identified data from host memory, reads the data from the identified section of the disk, and compares the data read from host memory to the data read from the disk. The disk controller then issues an end message, the format of which is shown in Fig. 4B. The end message will indicate whether the compared data was found to be identical.
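
The controller side of Compare Host is sketched below. The sketch treats host memory and the disk as flat byte strings and assumes a 512-byte block size. It reduces the end message to a success flag and an "identical" flag. `BLOCK_SIZE`, `CompareHostEnd` and `compare_host` are hypothetical names, and the real field layout of Fig. 4B is not reproduced.

```python
from dataclasses import dataclass

BLOCK_SIZE = 512  # assumed block size in bytes


@dataclass(frozen=True)
class CompareHostEnd:
    """Simplified Compare Host end message; field names are assumptions."""
    command_reference_number: int
    success: bool      # False if the requested range could not be read
    identical: bool    # result of the comparison


def compare_host(ref: int, host_memory: bytes, buffer_descriptor: int,
                 disk: bytes, logical_block_number: int,
                 byte_count: int) -> CompareHostEnd:
    """Compare a section of host memory with the same-length section of the disk."""
    start = logical_block_number * BLOCK_SIZE
    if (start + byte_count > len(disk)
            or buffer_descriptor + byte_count > len(host_memory)):
        return CompareHostEnd(ref, success=False, identical=False)
    from_host = host_memory[buffer_descriptor:buffer_descriptor + byte_count]
    from_disk = disk[start:start + byte_count]
    return CompareHostEnd(ref, success=True, identical=from_host == from_disk)
```

The controller compares the data itself. As a result, the host does not need to read the target's data back into its own memory.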

When adding a new member to the shadow set (i.e., a new disk 20), the system chooses a host processor to carry out the processing necessary to provide the new member with all of the data stored in the active members of the shadow set. The host will choose one active member to serve as the "source." It will then sequentially copy to the new member, or "target," all of the data from the source that differs from data in the new member. Using the method of the invention, data is transferred to the new member without interrupting normal I/O operations between other hosts and the shadow set. The method also assures that any changes to data in the shadow set made during the copy operation will propagate to the new member.

More specifically, the method of transferring data to a new member or target from a source involves sequentially reading data blocks, a cluster at a time (a cluster is a predetermined number of data blocks), from the source and comparing the read data to data stored at corresponding locations in the target. If the data is identical in the two corresponding clusters, then the next cluster is processed. Otherwise, the data read from the source is written to the target, and the host performs a similar comparison operation on the same two clusters once again. The second comparison on the same data is necessary because the shadow set is available for I/O operations to other hosts during the process of adding a new member. In other words, while the new member is being added, I/O operations such as write operations from hosts in the system will continue to the shadow set. Read commands are performed to any active member, but not to the target, since not all of the target's data is identical to the shadow set data. A write operation may occur after the system reads data from the source and before it writes the data to the target. In that case, obsolete data may be written into the target. That is, if the data in the shadow set is updated just before the host writes data to the target, the data written to the target will be obsolete.
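
The interleaving described above can be replayed step by step, as in the following sketch. Each medium is modeled as a dictionary from cluster number to contents. Another host's write is applied to the source and then to the target. The copying host then writes the data it read earlier. The final reread and compare show how the second comparison catches the obsolete data. `interleaved_copy` and its arguments are hypothetical.

```python
from typing import Dict, Tuple

Media = Dict[int, bytes]  # cluster number -> cluster contents (simplified model)


def interleaved_copy(source: Media, target: Media, cluster: int,
                     concurrent_data: bytes) -> Tuple[bool, bool]:
    """Replay one bad interleaving; return (stale_after_first_write, fixed_after_recheck)."""
    buffer = source[cluster]                 # copying host reads the cluster from the source
    source[cluster] = concurrent_data        # another host's write lands on the source...
    target[cluster] = concurrent_data        # ...and then on the target
    target[cluster] = buffer                 # copying host writes the now-obsolete data
    stale = target[cluster] != source[cluster]
    # Second comparison: reread the source, compare, and rewrite on a mismatch.
    buffer = source[cluster]
    if target[cluster] != buffer:
        target[cluster] = buffer
    return stale, target[cluster] == source[cluster]
```

Without the second comparison, the target would keep the obsolete contents of the cluster while every other member held the new data.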

Therefore, the system will perform a second compare operation on the same two corresponding data clusters after it writes data to the target. In this way, if the source has been updated and the target was inadvertently written with obsolete data, the second comparison will detect the difference. The write operation will then be repeated to provide the updated data to the target. Only after the two clusters are found to have identical data does the process move to the next data cluster.

It is theoretically possible for the same cluster to be compared and written many times. For example, a particularly oft-written file or data block might be changed constantly. In that case, the source data cluster and target data cluster would always be inconsistent. To prevent the system from repeating the comparison and writing steps in an infinite loop, a "repeat cluster counter" is used to determine how many times the loop has been executed.

This counter is initialized to zero and is incremented after data is written from the source to the target. The counter is monitored to determine if it reaches a predetermined threshold number. The counter is reset when the system moves on to a new cluster. Therefore, if the same cluster is continually compared and written to the target, the counter will eventually reach the threshold value. The system will then reduce the size of the cluster and perform the comparison again. Reducing the size of the cluster makes it more likely that the clusters on the two members will be consistent, since data in a smaller cluster is less likely to be changed by a write from one of the hosts. When a successful comparison is eventually achieved, the cluster size is restored to its previous value.
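
The repeat cluster counter and its interaction with the cluster size can be kept in one small object, as in the sketch below. The text says only that the cluster is "reduced." Halving the size, never going below one block, and restarting the counter after each reduction are all assumptions of this sketch. The sketch reduces the size when the counter reaches the threshold, following the wording above.

```python
from dataclasses import dataclass, field


@dataclass
class RepeatClusterCounter:
    """Counts rewrites of the current cluster and shrinks the cluster when needed.

    Assumptions not stated in the text: the cluster size is halved (never below
    one block) each time the counter reaches `threshold`, and the counter
    restarts after each reduction.
    """
    normal_size: int
    threshold: int
    size: int = field(init=False)
    count: int = field(default=0, init=False)

    def __post_init__(self) -> None:
        self.size = self.normal_size

    def after_write(self) -> int:
        """Call after the cluster is written to the target; return the size to use next."""
        self.count += 1
        if self.count >= self.threshold and self.size > 1:
            self.size = max(1, self.size // 2)
            self.count = 0
        return self.size

    def after_match(self) -> int:
        """Call when the comparison succeeds: reset the counter, restore the size."""
        self.count = 0
        self.size = self.normal_size
        return self.size
```

With `RepeatClusterCounter(normal_size=8, threshold=3)`, three consecutive rewrites shrink the cluster from 8 blocks to 4. Three more shrink it to 2. The next successful comparison restores it to 8.
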

As described above, I/O operations to the shadow set will continue while a new member is being added. However, hosts are required to perform write type operations in a manner that guarantees that, while a new member is being added, all data written to logical blocks on the target disk will be identical to the data contained on the source disk. If hosts issue I/O commands in parallel, as is normally done, the data on the source and target may not be consistent after the copy method described above is implemented. To avoid possible data corruption, hosts shall ensure that write operations addressed to the source disk are issued and completed before the equivalent operation is issued to the target disk.
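
The write-ordering rule is sketched below with a thread pool. The `write(member)` callable is a hypothetical stand-in for sending a write command to that member's disk controller and waiting for its end message. The sketch assumes that the member list contains both the source and the target. Writes to other active members may still proceed in parallel. Only the source-before-target order is enforced.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def shadow_write(members: List[str], write: Callable[[str], None],
                 source: str, target: str, adding_member: bool,
                 pool: ThreadPoolExecutor) -> None:
    """Send one logical write to every shadow set member.

    `write(member)` stands in for issuing a write command to that member's
    disk controller and waiting for its end message (an assumption of this sketch).
    """
    if not adding_member:
        # Normal operation: all member writes are issued in parallel.
        for future in [pool.submit(write, m) for m in members]:
            future.result()
        return
    # A member is being added: the source write must be issued and completed
    # before the equivalent write is issued to the target.
    futures = {m: pool.submit(write, m) for m in members if m != target}
    futures[source].result()
    pool.submit(write, target).result()
    for future in futures.values():
        future.result()
```

Waiting on the source's future before submitting the target's write guarantees the required order, whatever the thread scheduling.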

As explained above, each host stores a table that lists data that the host needs to operate properly in the system. For example, each table will include information regarding the disks that make up the shadow set, etc. The table also stores status information that informs the host whether or not a new member is being added to the shadow set. Therefore, before a host executes an I/O request to the shadow set, it will check the status field in its table. If the host determines that a new member is being added, it will implement the special procedures discussed above for avoiding possible data corruption. The table is kept current by requiring a host that begins the process of adding a new member to send a message to every other host in the system, informing each host of the operation. A host that is controlling the addition of the new member will not begin the data transfer to the new member until it receives a confirmation from each host that each host has updated its table to reflect the new status of the system. Similarly, a host controlling the addition of a new member will send a message to each host when the new member has been added and has data that is consistent with the shadow set. Upon receiving this message, hosts will resume the normal practice of issuing I/O requests in parallel.
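
The announce, confirm, copy, announce sequence is sketched below. In the sketch, messages are modeled as direct method calls, and each host's table is reduced to the single status field that matters here. The class `Host`, its methods, and `add_member` are hypothetical names. A real system would exchange messages over the communications medium and could time out waiting for confirmations.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Host:
    """A host with the one table field this sketch needs (hypothetical names)."""
    name: str
    member_being_added: Optional[int] = None   # status field in the host's table

    def must_serialize_writes(self) -> bool:
        # Checked before every I/O request to the shadow set.
        return self.member_being_added is not None

    def on_add_started(self, unit: int) -> bool:
        self.member_being_added = unit
        return True                            # confirmation that the table is updated

    def on_add_finished(self) -> None:
        self.member_being_added = None         # resume issuing writes in parallel


def add_member(controlling: Host, others: List[Host], unit: int,
               copy: Callable[[], None]) -> None:
    """Notify every host, wait for all confirmations, copy, then notify completion."""
    hosts = [controlling, *others]
    confirmations = [h.on_add_started(unit) for h in hosts]
    if not all(confirmations):
        raise RuntimeError("not every host confirmed; data transfer not started")
    copy()
    for h in hosts:
        h.on_add_finished()
```

While `copy` runs, every host reports `must_serialize_writes()` as true. Afterward, all hosts have returned to parallel writes.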

The method of the invention will now be explained in detail, with reference to the flow chart of Fig. 5. The host first initializes two counters: a "logical cluster counter" and the repeat cluster counter (step 1). The logical cluster counter is used to identify clusters in each shadow set member and, when initialized, will identify a first cluster of data. As the logical cluster counter is incremented, it sequentially identifies each cluster in the shadow set.

Next, the host selects one active shadow set member to serve as the source, with the new member serving as the target (step 2). The host then issues a read command of the type illustrated in Fig. 2B to the disk controller serving the source (identified by the "unit number" field in the command). The command requests that the data in the cluster identified by the logical cluster counter be read and transmitted to host memory (step 3).

The disk controller serving the source will receive the read command, read the identified data from the source into its buffer 22, and transmit the data to host memory 12. It also issues an end message (see Fig. 3B) informing the host that the read command was executed (step 4).

After the host receives the end message indicating that the data from the source is in host memory 12 (step 5), the host will issue a Compare Host command to the target to compare the data read from the source to the data stored in the same logical cluster in the target (step 6).

The target will receive the Compare Host command and perform the comparison, issuing an end message (see Fig. 4B) to the host indicating the result of the comparison (step 7).

The host receives the end message in step 8 
and determines whether the data compared by the 
Compare Host command was identical. If the com- 
pared data was identical, then the logical cluster 
counter is tested to see if it is equal to a predeter- 
mined last number (indicating that all data clusters 
have been processed) (step 9). If the logical cluster 
counter is equal to the last number, then the process is finished. Otherwise, the logical cluster
counter is incremented, the repeat cluster counter 
is reset (step 10). and the method returns to step 3 
to begin processing the next cluster. 

If the compared data was not identical, the host issues a write command to the disk controller serving the target. The command instructs the controller to write the data read from the source and sent to the host in step 4 to the section of the target identified by the logical cluster counter (i.e., the cluster in the target that corresponds to the cluster in the source from which the data was read) (step 11).

The disk controller serving the target receives the write command, reads the data from host memory, writes it to the section of the target identified by the current logical cluster counter, and issues an end message to the host (step 12).

The host receives the end message indicating that the write has been completed (step 13). It then increments the repeat cluster counter and determines if the repeat cluster counter is greater than a threshold value (step 14). As explained above, if the same cluster is written a predetermined number of times, the repeat cluster counter will reach a certain value and the size of the cluster is reduced. The process then returns to step 3 without updating the logical cluster counter. Since the logical cluster counter is not updated, the host will then read the same cluster once again from the source and perform the Compare Host operation to the target for the same cluster.
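
The flow chart can be condensed into one host-side loop, as in the sketch below, with step numbers in the comments. Several choices belong to the sketch rather than the text. Each disk controller is a toy in-memory `DiskController` whose methods return what the corresponding end message would report. The loop tracks the starting logical block of the current cluster in place of the logical cluster counter, so that a reduced cluster size does not disturb the addressing. Halving the cluster on reaching the threshold is also an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DiskController:
    """Toy controller for one disk; each method returns what its end message reports."""
    blocks: List[bytes]
    log: List[str] = field(default_factory=list)

    def read(self, lbn: int, count: int) -> List[bytes]:
        self.log.append("READ")
        return self.blocks[lbn:lbn + count]

    def compare_host(self, lbn: int, host_buffer: List[bytes]) -> bool:
        self.log.append("COMPARE_HOST")
        return self.blocks[lbn:lbn + len(host_buffer)] == host_buffer

    def write(self, lbn: int, host_buffer: List[bytes]) -> None:
        self.log.append("WRITE")
        self.blocks[lbn:lbn + len(host_buffer)] = host_buffer


def add_new_member(source: DiskController, target: DiskController,
                   cluster_size: int, threshold: int) -> None:
    """Copy `source` to `target` following steps 1-14 of Fig. 5 (simplified)."""
    lbn = 0           # step 1: stands in for the logical cluster counter
    repeat = 0        # step 1: repeat cluster counter
    size = cluster_size
    total = len(source.blocks)
    # step 2: the caller has already chosen `source` and `target`
    while True:
        count = min(size, total - lbn)
        host_buffer = source.read(lbn, count)              # steps 3-5
        if target.compare_host(lbn, host_buffer):          # steps 6-8
            if lbn + count >= total:                       # step 9: last cluster done
                return
            lbn += count                                   # step 10: next cluster,
            repeat = 0                                     #   reset repeat counter,
            size = cluster_size                            #   restore cluster size
            continue
        target.write(lbn, host_buffer)                     # steps 11-13
        repeat += 1
        if repeat > threshold and size > 1:                # step 14
            size = max(1, size // 2)
            repeat = 0
        # back to step 3 without advancing to the next cluster
```

Copying a five-block source over a target that differs only in blocks 2 and 3 (two-block clusters) takes four reads, four Compare Host commands and one write. The second cluster is rewritten once and then verified.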

The above-described embodiment is a preferred illustrative embodiment only, and there are various modifications that may be made within the spirit of the invention. For example, a new disk being added to the shadow set may be known to have little or no data that is consistent with data on the active members of the shadow set. In that case, the method may be modified so that steps 6-10 are eliminated the first time that each cluster is processed, and no Compare Host operation is performed. In other words, if the new disk is known to contain data that is different than the data stored in the shadow set, the result of the first Compare Host operation will be negative, and the operation need not be performed. However, if the new disk being added was in the shadow set in the recent past, then most of its data will be consistent with the data in the shadow set, and the Compare Host operation should be performed the first time.
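
The trade-off in this modification can be shown by counting commands, as in the sketch below. The sketch copies clusters between in-memory lists with no concurrent writers. It also omits the repeat cluster counter for brevity. `copy_clusters` and `skip_first_compare` are hypothetical names.

```python
from typing import List, Tuple


def copy_clusters(source: List[bytes], target: List[bytes], cluster: int,
                  skip_first_compare: bool) -> Tuple[int, int]:
    """Return (compare_host_commands, write_commands) issued to the target."""
    compares = writes = 0
    for lbn in range(0, len(source), cluster):
        first = True
        while True:
            data = source[lbn:lbn + cluster]                 # read cluster from source
            if not (first and skip_first_compare):
                compares += 1
                if target[lbn:lbn + cluster] == data:        # Compare Host
                    break
            target[lbn:lbn + cluster] = data                 # write cluster to target
            writes += 1
            first = False
    return compares, writes
```

The example uses a four-block source and two-block clusters. For a blank target, skipping the first comparison cuts the Compare Host commands from 4 to 2, with the same 2 writes. For a target that already matches, such as a recent former member, the normal path needs only 2 comparisons and no writes. The skip variant would issue 2 unnecessary writes.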

Therefore, the invention shall be limited only 
by the scope of the appended claims. 



Claims

1. A method of transferring data from a first storage medium to a second storage medium, said first storage medium being accessible to one or more host processors, each of said storage media being divided into corresponding data blocks, said method comprising the steps of:
(a) reading data stored in a first data block in said first storage medium, the first data block initially constituting a current data block;
(b) comparing the data read in said current data block to data stored in a corresponding data block in said second storage medium;
(c) if the data compared in step b are identical, reading data stored in a different data block in said first storage medium, said different data block becoming the current data block, and returning to step b;
(d) if the data compared in step b are not identical, modifying the data stored in one of said storage media such that the data in said current data block is identical to the corresponding data in said second storage medium; and
(e) rereading the data in said current data block and returning to step b.

2. The method of claim 1 wherein said step d modifying comprises modifying the data in said second storage medium.

3. The method of claim 2 wherein said step d modifying comprises writing said data read from said current data block in said first storage medium to the corresponding data block in said second storage medium.

4. The method of claim 1 wherein said different data block is a data block adjacent to said current data block.

5. The method of claim 1 wherein each data block in said first storage medium is compared to a corresponding data block in said second storage medium.

6. The method of claim 1 wherein each of said storage media are members of a shadow set of storage media.

7. The method of claim 6 wherein each of said storage media may be directly accessed by a host processor.

8. The method of claim 6 wherein each of said storage media may be directly accessed by each of a plurality of host processors.

9. The method of claim 1 wherein said storage media are disk storage devices.

10. A method of managing a shadow set of storage media accessible by one or more host processors for I/O operations, comprising the steps of:
A. carrying out successive comparisons of data stored in corresponding locations in a plurality of said storage media, respectively; and
B. performing a management operation on at least one of said storage media, said management operation comprising, for each of said corresponding locations where said comparisons indicated that the data in said corresponding locations was not identical:
a. reading data from locations in one of said storage media and writing said data to corresponding locations in another of said storage media; and
b. comparing the data in said corresponding locations after said writing to determine if the data in said corresponding locations is identical.

11. An apparatus for managing a shadow set of storage media accessible by one or more host processors for I/O operations, comprising:
means for carrying out successive comparisons of data stored in corresponding locations in a plurality of said storage media, respectively; and
means for performing a management operation on at least one of said storage media, said management operation comprising, for each of said corresponding locations where said comparisons indicated that the data in said corresponding locations was not identical:
a. reading data from locations in one of said storage media and writing said data to corresponding locations in another of said storage media; and
b. comparing the data in said corresponding locations after said writing to determine if the data in said corresponding locations is identical.

12. A program for controlling one or more processors in a digital computer, the digital computer processing at least one process which enables said processors to manage a shadow set of storage media accessible by one or more host processors for I/O operations, said program comprising:
a comparison module for enabling one of said processors to carry out successive comparisons of data stored in corresponding locations in a plurality of said storage media, respectively; and a management module for enabling one of said processors to perform a management operation on at least one of said storage media, said management operation comprising, for each of said corresponding locations where said comparisons indicated that the data in said corresponding locations was not identical:
a. reading data from locations in one of said storage media and writing said data to corresponding locations in said other storage media; and
b. comparing the data in said corresponding locations after said writing to determine if the data in said corresponding locations is identical.

13. The method of claim 1 wherein each of said hosts will transmit any write requests to said storage media first to said first storage medium and, after said write request to said first storage medium has completed, will transmit said write request to said second storage medium.

14. The method of claim 1 wherein each of said host processors maintains a table including information relating to said data transfer.

15. The method of claim 10 wherein each of said
storage media may be directly accessed by each



EP 0 405 926 A2 12 



of a plurality of host processors.

16. The method of claim 10 wherein each of said 
hosts will transmit any write requests to said stor- 
age media first to said one of said storage media 
and after said write request to said one of said 
storage media has completed, will transmit said 
write request to said another of said storage media. 
17. The method of claim 10 wherein each of said host processors maintains a table including information relating to said management operation.

18. The method of claim 10 wherein step B(b) comprises rereading said data from said locations in said one of said storage media and comparing said reread data to data in said corresponding locations in said another of said storage media.

19. The method of claim 10 wherein steps B(a) and
B(b) are repeated recursively for the same cor- 
responding locations until the data stored in said 
corresponding locations is determined to be iden- 
tical. 

20. The apparatus of claim 11 wherein each of said hosts will transmit any write requests to said storage media first to said one of said storage media and, after said write request to said one of said storage media has completed, will transmit said write request to said another of said storage media.

21. The apparatus of claim 11 wherein each of said
host processors maintains a table including infor- 
mation relating to said management operation. 

22. The apparatus of claim 11 wherein each of said
storage media may be directly accessed by each 
of a plurality of host processors. 

23. The apparatus of claim 11 wherein said man- 
agement operation comprises rereading said data 
from said locations in said one of said storage 
media after said writing and comparing said reread 
data to data in said corresponding locations in said 
another of said storage media. 

24. The apparatus of claim 11 wherein said means
for performing said management operation repeats 
said reading and comparing recursively for the 
same corresponding locations until the data stored 
in said corresponding locations is determined to be 
identical. 